Lesson 23 - RegEx

The re module, Python's regular expression (RegEx) module, is mainly used to find patterns in text. Once again, since there are so many attributes and methods in the module, I'll be going over just a few of the main ones, and here's a link to the complete list of them.

One of the main methods is the findall() method. With RegEx we can build patterns out of 'metacharacters', 'special sequences', and 'sets'. I won't go over all of them, but once again the link has a list of all of them. Let's say we want to find the number of times a paragraph has 'Hello World' in it. We can import the re module and use the findall() method.

import re

paragraph = 'Hello World is what is commonly said when a programmer starts learning a language. Hello World is made up of two separate words. Hello World is quite a funny phrase as you are addressing the entire world. Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World Hello World'

y = re.findall('Hello World', paragraph)

print(len(y))

So first we create a variable called paragraph, which holds a long paragraph containing the phrase 'Hello World' many times. Next we set a new variable y to the result of re.findall(), passing it the phrase we want to find, 'Hello World', and the text we are searching in, which is the variable paragraph. findall() returns a list of every match, so len(y) gives the number of times the phrase appears.
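As a rough sketch of the three kinds of pattern pieces mentioned earlier, the snippet below runs findall() with a special sequence (\d), a set ([aeiou]), and a couple of metacharacters (+ and ^). The sample string and the variable names are made up for illustration and are not part of the lesson.

```python
import re

sample = 'Lesson 23 covers RegEx in 3 parts'  # hypothetical sample text

# Special sequence: \d matches one digit; the metacharacter + means "one or more"
numbers = re.findall(r'\d+', sample)

# Set: [aeiou] matches any single lowercase vowel
vowels = re.findall(r'[aeiou]', sample)

# Metacharacter: ^ anchors the pattern to the start of the string
starts_with_lesson = re.findall(r'^Lesson', sample)

print(numbers)             # ['23', '3']
print(len(vowels))         # 7
print(starts_with_lesson)  # ['Lesson']
```

In each case findall() returns a list, so an empty list means the pattern was not found anywhere in the text.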

Another one of the main methods is the search() method. Its purpose is just to test whether there is at least one instance of the thing you are searching for in the text, rather than finding every single instance of it like the findall() method does. Let's say we wanted to check whether the paragraph variable contains the phrase 'Hello World'. I wonder what the answer will be.

def in_text(phrase, text):
    # re.search() returns a Match object for the first match, or None if there is none
    x = re.search(phrase, text)
    if x:
        print(phrase + ' is in ' + text)
    else:
        print(phrase + ' is not in ' + text)

in_text('Hello World', paragraph)

We first create a function called in_text() with two parameters, phrase and text. We pass those two parameters to the search() method and then use an if statement: if a match was found, the function prints that the phrase is in the text, and otherwise it prints that the phrase is not in the text. We can't simply print the value returned by search(), because that would output <re.Match object; span=(0, 11), match='Hello World'>
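The Match object shown above is still useful once you know how to read it. Here is a minimal sketch, using a made-up sample string, of pulling the matched text and its position out of the object with group() and span(), and of what search() gives back when nothing matches.

```python
import re

text = 'Hello World is a common first program'  # hypothetical sample text

match = re.search('Hello World', text)
if match:
    found = match.group()    # the text that matched
    position = match.span()  # (start, end) indexes of the match
    print(found, position)   # Hello World (0, 11)

# search() returns None when the pattern is not in the text
missing = re.search('Goodbye', text)
print(missing is None)       # True
```

Because None counts as false in an if statement, a check like "if match:" is enough to tell a hit from a miss.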